Graph neural networks (GNNs) find applications in various domains such as computational biology, natural language processing, and computer security. Owing to their popularity, there is an increasing need to explain GNN predictions since GNNs are black-box machine learning models. One way to address this is counterfactual reasoning where the objective is to change the GNN prediction by minimal changes in the input graph. Existing methods for counterfactual explanation of GNNs are limited to instance-specific local reasoning. This approach has two major limitations of not being able to offer global recourse policies and overloading human cognitive ability with too much information. In this work, we study the global explainability of GNNs through global counterfactual reasoning. Specifically, we want to find a small set of representative counterfactual graphs that explains all input graphs. Towards this goal, we propose GCFExplainer, a novel algorithm powered by vertex-reinforced random walks on an edit map of graphs with a greedy summary. Extensive experiments on real graph datasets show that the global explanation from GCFExplainer provides important high-level insights of the model behavior and achieves a 46.9% gain in recourse coverage and a 9.5% reduction in recourse cost compared to the state-of-the-art local counterfactual explainers.
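The vertex-reinforced random walk at the heart of GCFExplainer biases each step toward already-visited vertices. A minimal sketch of that walk dynamic on a toy graph (the graph, the visit-count weighting, and all parameters here are illustrative stand-ins, not GCFExplainer's actual edit map or importance function):

```python
import random

def vertex_reinforced_walk(adj, start, steps, seed=0):
    """Walk a graph, biasing each step toward frequently visited neighbors.

    adj  : dict mapping node -> list of neighbor nodes
    start: starting node
    steps: number of steps to take
    """
    rng = random.Random(seed)
    visits = {v: 1 for v in adj}  # reinforcement counts, initialized to 1
    walk = [start]
    current = start
    for _ in range(steps):
        neighbors = adj[current]
        # Transition probability proportional to each neighbor's visit count.
        weights = [visits[n] for n in neighbors]
        current = rng.choices(neighbors, weights=weights, k=1)[0]
        visits[current] += 1
        walk.append(current)
    return walk

# Toy 4-cycle standing in for an edit map of counterfactual candidates.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
walk = vertex_reinforced_walk(adj, start=0, steps=10)
print(len(walk))  # 11: the start node plus 10 steps
```

The reinforcement makes revisited regions progressively stickier, which is what lets the walk concentrate on representative candidates rather than diffusing uniformly.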
Nasopharyngeal carcinoma (NPC) is a malignant tumor. Accurate and automatic segmentation of organs at risk (OARs) in computed tomography (CT) images is clinically significant. In recent years, deep learning models represented by U-Net have been widely applied to medical image segmentation tasks, which can help doctors reduce their workload and obtain accurate results faster. In OAR segmentation for NPC, the sizes of OARs are variable, and some of them are particularly small. Lacking the use of global and multi-scale information, traditional deep neural networks perform poorly during segmentation. This paper proposes a new SE-Connection Pyramid Network (SECP-Net). SECP-Net extracts global and multi-scale information flow using SE-connection (SEC) modules and the pyramid structure of the network to improve segmentation performance, especially for small organs. SECP-Net also designs an auto-context cascaded network to further improve segmentation performance. Comparative experiments are conducted with SECP-Net and other recent methods on a dataset of head and neck CT images. Five-fold cross-validation is used to evaluate the performance based on two metrics, i.e., Dice and Jaccard similarity. Experimental results show that SECP-Net can achieve state-of-the-art performance on this challenging task.
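The two evaluation metrics named above are standard overlap measures between a predicted and a ground-truth mask. A minimal sketch of both on toy binary masks:

```python
import numpy as np

def dice(pred, gt, eps=1e-7):
    """Dice similarity between binary masks: 2|A∩B| / (|A| + |B|)."""
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

def jaccard(pred, gt, eps=1e-7):
    """Jaccard similarity (IoU): |A∩B| / |A∪B|."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return (inter + eps) / (union + eps)

# Tiny 2x3 masks: 2 overlapping pixels, 3 predicted, 3 ground truth.
pred = np.array([[1, 1, 0], [0, 1, 0]], dtype=bool)
gt   = np.array([[1, 0, 0], [0, 1, 1]], dtype=bool)
print(round(dice(pred, gt), 3))     # 2*2/(3+3) ≈ 0.667
print(round(jaccard(pred, gt), 3))  # 2/4 = 0.5
```

Both metrics are especially sensitive for small organs, since a few mislabeled pixels change the overlap ratio substantially.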
We define \emph{laziness} to describe a large suppression of variational parameter updates in neural networks, classical or quantum. In the quantum case, the suppression is exponential in the number of qubits for random variational quantum circuits. We discuss the difference between laziness and the \emph{barren plateau} phenomenon in quantum machine learning created by quantum physicists in \cite{mcclean2018barren}, which concerns gradient descent in quantum machines. Based on the theory of neural tangent kernels, we address a new theoretical understanding of these two phenomena. For noiseless quantum circuits, without measurement noise, the loss function landscape is complicated in the overparametrized regime with a large number of trainable variational angles. Instead, around a random starting point of optimization, there are a large number of local minima that are good enough and can minimize the mean square loss function; there we still have quantum laziness, but we do not have barren plateaus. However, the complicated landscape is not visible within a finite number of iterations, given the low precision of quantum control and quantum sensing. Moreover, we look at the effect of noise during optimization by assuming intuitive noise models, and show that variational quantum algorithms are noise-resilient in the overparametrized regime. Our work precisely reformulates the quantum barren plateau statement into a precision statement, justifies the statement under certain noise models, injects new hope into near-term variational quantum algorithms, and provides a theoretical connection to classical machine learning. Our paper also provides conceptual perspectives on quantum barren plateaus, together with a discussion of the gradient descent dynamics in \cite{gater}.
Pose registration is critical in vision and robotics. This paper focuses on the challenging task of initialization-free pose registration, up to 7DoF, for homogeneous and heterogeneous measurements. While recent learning-based methods show promise using differentiable solvers, they either rely on heuristically defined correspondences or are prone to local minima. We present a differentiable phase correlation (DPC) solver that is globally convergent and correspondence-free. When combined with a simple feature extraction network, our general framework DPCN++ allows for versatile pose registration with arbitrary initialization. Specifically, the feature extraction network first learns dense feature grids from a pair of homogeneous/heterogeneous measurements. These feature grids are then transformed into a translation- and scale-invariant spectrum representation based on Fourier transform and spherical radial aggregation, decoupling translation and scale from rotation. Next, rotation, scale, and translation are estimated independently and efficiently in the spectrum using the DPC solver. The entire pipeline is differentiable and trained end-to-end. We evaluate DPCN++ on multiple registration tasks with different input modalities, including 2D bird's-eye-view images, 3D object and scene measurements, and medical images. Experimental results demonstrate that DPCN++ outperforms both classical and learning-based baselines, especially on partially observed heterogeneous measurements.
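The DPC solver builds on classical phase correlation, which recovers a relative shift as the peak of the inverse FFT of the normalized cross-power spectrum. A 1-D sketch of that classical, argmax-based version follows; note that DPCN++'s solver is differentiable (it replaces the hard argmax), which this sketch is not:

```python
import numpy as np

def phase_correlation_shift(a, b):
    """Estimate the circular shift from signal b to signal a via phase correlation.

    The normalized cross-power spectrum keeps only phase information, so its
    inverse FFT is (ideally) a delta function located at the relative shift.
    """
    A, B = np.fft.fft(a), np.fft.fft(b)
    cross = A * np.conj(B)
    cross /= np.abs(cross) + 1e-12  # normalize: keep phase only
    corr = np.fft.ifft(cross).real  # peak sits at the relative shift
    return int(np.argmax(corr))

x = np.zeros(64)
x[10:20] = 1.0
y = np.roll(x, 5)                   # y is x circularly shifted by 5
print(phase_correlation_shift(y, x))  # 5
```

Because the estimate comes from a global spectral peak rather than local gradient steps, the approach is correspondence-free and insensitive to initialization, which is the property DPCN++ exploits.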
Despite recent progress in light field super-resolution (LFSR) achieved by convolutional neural networks, the correlation information of light field (LF) images has not been sufficiently studied and exploited due to the complexity of 4D LF data. To cope with such high-dimensional LF data, most existing LFSR methods resort to decomposing it into lower dimensions and subsequently performing optimization on the decomposed subspaces. However, these methods are inherently limited because they neglect the characteristics of the decomposition operations and exploit only a limited set of LF subspaces, ultimately failing to comprehensively extract spatio-angular features and leading to a performance bottleneck. To overcome these limitations, in this paper, we thoroughly explore the potential of LF decomposition and propose the novel concept of decomposition kernels. In particular, we systematically unify the decomposition operations of various subspaces into a series of such decomposition kernels, which are incorporated into our proposed Decomposition Kernel Network (DKNet) for comprehensive spatio-angular feature extraction. The proposed DKNet is experimentally verified to achieve substantial improvements over the state-of-the-art methods at 2x, 3x, and 4x LFSR scales. To further refine DKNet toward producing more visually pleasing LFSR results, we propose an LFVGG loss to guide the Texture-Enhanced DKNet (TE-DKNet) to generate rich authentic textures, significantly improving the visual quality of LF images. We also propose an indirect evaluation metric that leverages LF material recognition to objectively assess the perceptual enhancement brought by the LFVGG loss.
Monocular visual-inertial odometry (VIO) is a critical problem in robotics and autonomous driving. Traditional methods solve this problem based on filtering or optimization. While fully interpretable, they rely on manual intervention and empirical parameter tuning. Learning-based methods, on the other hand, allow for end-to-end training but require large amounts of training data to learn millions of parameters, and their non-interpretable, heavy models hinder generalization. In this paper, we propose a fully differentiable, interpretable, bird's-eye-view (BEV) based VIO model for robots with local planar motion that can be trained without deep neural networks. Specifically, we first adopt an unscented Kalman filter as a differentiable layer to predict pitch and roll, where the noise covariance matrices are learned to filter the noise of the raw IMU data. Second, the refined pitch and roll are used to retrieve a gravity-aligned BEV image for each frame via differentiable camera projection. Finally, a differentiable pose estimator is utilized to estimate the remaining 3-DoF pose between BEV frames, resulting in a 5-DoF pose estimate. Our method allows the covariance matrices to be learned under supervision of the pose estimation loss, demonstrating performance superior to empirically tuned ones. Experimental results on synthetic and real-world datasets show that our simple approach is competitive with state-of-the-art methods and generalizes well to unseen scenes.
Motion retargeting from human demonstrations to robots is an effective way to reduce the professional requirements and workload of robot programming, but it faces challenges arising from the differences between humans and robots. Traditional optimization-based methods are time-consuming and rely on good initialization, while recent studies using feedforward neural networks suffer from poor generalization to unseen motions. Moreover, they neglect the topological information in human skeletons and robot structures. In this paper, we propose a novel neural latent optimization approach to address these problems. Latent optimization utilizes a decoder to establish a mapping between the latent space and the robot motion space; afterwards, by searching for the optimal latent vector, a retargeting result that satisfies the robot's constraints can be obtained. Together with latent optimization, neural initialization exploits an encoder to provide a better initialization for faster and better convergence of the optimization. Both the human skeleton and the robot structure are modeled as graphs to make better use of topological information. We conduct experiments on retargeting Chinese sign language, which involves two arms and two hands and imposes additional requirements on the relative relationships among joints. The experiments include retargeting various human demonstrations to YuMi, NAO, and Pepper in a simulation environment, and to YuMi in a real-world environment. The efficiency and accuracy of the proposed method are verified.
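The latent-optimization step described above can be sketched as plain gradient descent on a latent vector through a decoder: decode, measure the mismatch to the target motion, and step the latent vector along the gradient. The toy linear decoder and L2 loss below are illustrative assumptions, not the paper's learned graph decoder or constraint handling:

```python
import numpy as np

def latent_optimize(decoder, z0, target, lr=0.1, iters=100):
    """Gradient descent in latent space: refine z so decoder(z) matches target."""
    z = z0.copy()
    for _ in range(iters):
        residual = decoder.forward(z) - target  # L2 loss residual
        z -= lr * decoder.backward(residual)    # step along d(loss)/dz
    return z

class ToyDecoder:
    """Linear stand-in for the learned latent-to-motion decoder."""
    def __init__(self, W):
        self.W = W
    def forward(self, z):
        return self.W @ z
    def backward(self, residual):
        return self.W.T @ residual

# 6-dimensional "robot motion" decoded from a 3-dimensional latent vector.
W = np.vstack([np.eye(3), np.eye(3)])
z_true = np.array([1.0, -0.5, 2.0])
target = W @ z_true                              # motion to reproduce
z = latent_optimize(ToyDecoder(W), np.zeros(3), target)
print(np.allclose(W @ z, target, atol=1e-6))     # True
```

In the paper's setting, a neural-initialization encoder would supply `z0` instead of the zero vector, speeding up this inner optimization.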
This paper focuses on designing efficient models with low parameter counts and FLOPs for dense predictions. Even though CNN-based lightweight methods have achieved stunning results after years of research, the trade-off between model accuracy and constrained resources still needs further improvement. This work rethinks the essential unity of the efficient Inverted Residual Block in MobileNetv2 and the effective Transformer in ViT, inductively abstracting a general concept of the Meta-Mobile Block, and we argue that the specific instantiation is very important to model performance even though the instantiations share the same framework. Motivated by this phenomenon, we deduce a simple yet efficient modern \textbf{I}nverted \textbf{R}esidual \textbf{M}obile \textbf{B}lock (iRMB) for mobile applications, which absorbs CNN-like efficiency to model short-distance dependencies and Transformer-like dynamic modeling capability to learn long-distance interactions. Furthermore, we design a ResNet-like 4-phase \textbf{E}fficient \textbf{MO}del (EMO) based only on a series of iRMBs for dense applications. Extensive experiments on the ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of our EMO over state-of-the-art methods, \eg, our EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1 accuracy, surpassing \textbf{SoTA} CNN-/Transformer-based models, while trading off model accuracy and efficiency well.
Supervised Question Answering systems (QA systems) rely on domain-specific human-labeled data for training. Unsupervised QA systems generate their own question-answer training pairs, typically using secondary knowledge sources to achieve this outcome. Our approach (called PIE-QG) uses Open Information Extraction (OpenIE) to generate synthetic training questions from paraphrased passages and uses the question-answer pairs as training data for a language model for a state-of-the-art QA system based on BERT. Triples in the form of <subject, predicate, object> are extracted from each passage, and questions are formed with subjects (or objects) and predicates while objects (or subjects) are considered as answers. Experimenting on five extractive QA datasets demonstrates that our technique achieves on-par performance with existing state-of-the-art QA systems with the benefit of being trained on an order of magnitude fewer documents and without any recourse to external reference data sources.
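The triple-to-question construction can be sketched directly: each <subject, predicate, object> triple yields two QA pairs, one with the object as the answer and one with the subject. The [MASK]-style templates below are illustrative stand-ins, not PIE-QG's exact question wording:

```python
def triples_to_qa(triples):
    """Form cloze-style question-answer pairs from OpenIE triples.

    For each (subject, predicate, object) triple, one question masks the
    object (subject and predicate remain visible) and one masks the subject.
    """
    pairs = []
    for subj, pred, obj in triples:
        pairs.append((f"{subj} {pred} [MASK]?", obj))   # object is the answer
        pairs.append((f"[MASK] {pred} {obj}?", subj))   # subject is the answer
    return pairs

triples = [("Marie Curie", "discovered", "radium")]
pairs = triples_to_qa(triples)
print(pairs[0])  # ('Marie Curie discovered [MASK]?', 'radium')
```

Pairs generated this way from paraphrased passages can then serve as synthetic supervision for an extractive QA model, with no human labeling involved.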
Transformer has achieved impressive successes for various computer vision tasks. However, most existing studies require pretraining the Transformer backbone on a large-scale labeled dataset (e.g., ImageNet) to achieve satisfactory performance, which is usually unavailable for medical images. Additionally, due to the gap between medical and natural images, the improvement generated by ImageNet pretrained weights significantly degrades when transferring the weights to medical image processing tasks. In this paper, we propose Bootstrap Own Latent of Transformer (BOLT), a self-supervised learning approach specifically for medical image classification with the Transformer backbone. Our BOLT consists of two networks, namely the online and target branches, for self-supervised representation learning. Concretely, the online network is trained to predict the target network representation of the same patch embedding tokens under a different perturbation. To maximally excavate the impact of the Transformer given limited medical data, we propose an auxiliary difficulty ranking task: the Transformer is enforced to identify which branch (i.e., online/target) is processing the more difficult perturbed tokens. Overall, the Transformer endeavours to distill transformation-invariant features from the perturbed tokens, simultaneously achieving difficulty measurement and maintaining the consistency of the self-supervised representations. The proposed BOLT is evaluated on three medical image processing tasks, i.e., skin lesion classification, knee fatigue fracture grading, and diabetic retinopathy grading. The experimental results validate the superiority of our BOLT for medical image classification, compared to ImageNet pretrained weights and state-of-the-art self-supervised learning approaches.
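BOLT's online/target pairing follows the BYOL-style bootstrap pattern its name alludes to, where the target branch is commonly maintained as an exponential moving average (EMA) of the online weights rather than trained directly. The abstract does not state BOLT's exact target-update rule, so the EMA sketch below is an assumption:

```python
import numpy as np

def ema_update(target_params, online_params, momentum=0.99):
    """Move target-branch weights toward the online branch by an EMA step.

    The online branch is trained by gradient descent; the target branch is
    never trained directly and only tracks the online weights slowly.
    """
    return [momentum * t + (1.0 - momentum) * o
            for t, o in zip(target_params, online_params)]

online = [np.ones((2, 2))]   # stand-in for trained online weights
target = [np.zeros((2, 2))]  # stand-in for the slow target copy
target = ema_update(target, online, momentum=0.9)
print(target[0][0, 0])       # close to 0.1 after one update
```

The slow-moving target gives the online branch a stable prediction objective, which is what prevents the representations from collapsing without negative pairs.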